Day 12 - Regular expressions - Single characters


$ grep -E "d." examples.txt

dog

beholder

dryad

dog

aardvark

corn dog

direwolf

phase spider

undead red dragon

Spider-Man [*]

wild hog

Big Bad Wolf

As you can see, grep highlights groups of two characters, each starting with d: the “do” in dog, the “de” in spider, the “dr” in dragon. All these lines have one thing in common: they contain a d followed by another character (a letter or a space, as in “wild hog”).

This is what the symbol . does in a regular expression: it does not mean a full stop, as in standard punctuation, but “any character”. Wherever a regular expression contains a ., the text may have any single character in that position. Keep in mind that a single . matches exactly one character.
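A quick way to check that a dot stands for exactly one character is to feed grep a few lines directly. The sketch below assumes a grep that supports the -o option (GNU and BSD grep do), which prints only the matched text instead of the whole line. The sample words are chosen for illustration and the file examples.txt is not used.

```shell
# Sample input (for illustration only): one entry per line.
words=$(printf '%s\n' "dog" "wild hog" "bad")

# "d." needs a d followed by exactly one more character.
# "bad" ends with d, so nothing is left for the dot to match.
matching_lines=$(printf '%s\n' "$words" | grep -E "d.")

# -o prints only the matched part of each line.
matched_parts=$(printf '%s\n' "$words" | grep -E -o "d.")

printf 'Matching lines:\n%s\n' "$matching_lines"
printf 'Matched parts:\n%s\n' "$matched_parts"
```

The line “bad” is not reported because its d is the last character, while “wild hog” is reported because the space after the d counts as a character.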

As you can see, then, regular expressions are simple strings that can contain both normal characters (mostly letters of the alphabet, lowercase and uppercase, and digits) and special ones. So far we have learned about only one of the special characters, ., commonly called “dot” in this context.

How do we match a literal dot in the string? Since regular expressions assign a special meaning to some characters, when you want to use those characters for their original value you have to escape them with a \ (backslash). So, while

$ grep -E "1.1" examples.txt

Police 101

HTTP/1.1

matches two characters “1” separated by any single character, the regular expression

$ grep -E "1\.1" examples.txt

HTTP/1.1

matches only those separated by a literal dot. Note that the dot can be a punctuation mark, a decimal point, or have any other meaning. Regular expressions don’t know anything about the text that you are parsing; they just consider raw characters.
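The difference between the two patterns can be reproduced without examples.txt. This is a minimal sketch that assumes a small sample of three lines (the third one, “Route 66”, is made up) and uses grep -c to count the matching lines. Inside double quotes the shell leaves \. untouched, so grep receives the backslash as written.

```shell
# Sample input: two lines from the text plus one made-up line that should never match.
sample=$(printf '%s\n' "Police 101" "HTTP/1.1" "Route 66")

# Unescaped dot: any single character may sit between the two 1s.
any_char_count=$(printf '%s\n' "$sample" | grep -E -c "1.1")

# Escaped dot: only a literal "." may sit between the two 1s.
literal_dot_lines=$(printf '%s\n' "$sample" | grep -E "1\.1")

printf 'Lines matched by the unescaped pattern: %s\n' "$any_char_count"
printf 'Lines matched by the escaped pattern: %s\n' "$literal_dot_lines"
```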

The other important tool that can use regular expressions is sed, which we already met twice in the previous chapters. To use the same syntax in sed you need the -r option, which makes the search pattern in an s/ command an extended regular expression like the ones accepted by grep -E.
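As a first taste of this, the sketch below replaces a d followed by any character with two underscores. It assumes GNU sed, where -r enables this syntax (GNU sed also accepts -E for the same purpose). The choice of “__” as replacement text is arbitrary, and the trailing g flag, which applies the substitution to every match on the line, is added here only to show the contrast.

```shell
line="undead red dragon"

# Without the g flag only the first match ("de") is replaced.
first_only=$(printf '%s\n' "$line" | sed -r 's/d./__/')

# With the g flag every non-overlapping match is replaced.
all_matches=$(printf '%s\n' "$line" | sed -r 's/d./__/g')

printf '%s\n' "$first_only"
printf '%s\n' "$all_matches"
```

The dot behaves exactly as it does in grep: in the second command the spaces after “undead” and “red” disappear, because each of them is the single character consumed by the dot.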